Audio-Visual Speaker Recognition for Video Broadcast News
Abstract
Significant progress has been made in the transcription of the audio stream in the broadcast news domain, for both radio news and TV news (HUB4 task). Such transcripts provide an excellent means of indexing video content for search and retrieval. Speaker identification is an important technology in this domain, both for selecting high-accuracy speaker-dependent models for transcription and as an index for search and retrieval of video content. However, transcription accuracy under acoustically degraded conditions (such as background noise) and channel mismatch (telephone) still needs further improvement, and improving it under such conditions is a hard problem. We have begun investigating the combination of audio-based processing with visual processing for both speech and speaker recognition to improve accuracy in acoustically degraded conditions. The use of two independent sources of information brings significantly increased robustness to signal degradation, since degradations in the two channels are uncorrelated, and the use of visual information allows much faster speaker identification than is possible with acoustic information. In this paper, we present some encouraging preliminary results for audio-visual speaker recognition on TV broadcast news data (CNN).
Similar resources
Audio-visual speaker recognition for video broadcast news: some fusion techniques
Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions either due to channel or noise. In this paper, we explore various techniques to fuse video-based speaker identification with audio-based speaker identification to improve the performance under mismatched conditions. Specifically, we explore techniques to optimally determine the relativ...
UCBN: A new audio-visual broadcast news corpus for multimodal speaker verification studies
The performance of face, voice, and multimodal speaker verification systems in complex and non-controlled scenarios is typically lower than that of systems developed in highly controlled environments. With the aim of facilitating the development of robust multimodal speaker recognition systems, a new multimodal (audio-visual) Australian broadcast UCBN (University of Canberra Broadcast News) corpus was...
Detecting News Reporting Using Audio/Visual Information
This paper proposes an integrated approach to discriminate news reporting from everything else in broadcast news data based on both audio and visual information. The separation of news reporting segments from others not only can provide useful indices for video streams but also serves as a pre-processing step for tasks such as speaker identification and speech recognition so that only speech seg...
Various Methods for Visual Speaker Identification for Automatic Continuous Speech Recognition in TV Broadcast Programs
This paper describes different methods and algorithms that were used for speaker identification from video recordings for TV broadcast news transcription. The information from visual speaker identification was used in our complex system for automatic continuous speech recognition of TV broadcast programs because it is possible to use speaker adapted (SA) Hidden Markov Models (HMMs) if we hav...
Information Access using Speech, Speaker and Face Recognition
We describe a scheme to combine the results of audio and face identification for multimedia indexing and retrieval. Audio analysis consists of speech and speaker recognition derived from a broadcast news video clip. The video component is analyzed to identify the persons in the same video clip using face recognition. When applied individually, both speaker and face recognition schemes have limitat...
Journal title:
- VLSI Signal Processing
Volume 29
Publication date: 2001